Generating Classifier Commitees by Stochastically Selecting both Attributes and Training Examples

نویسنده

  • Zijian Zheng
چکیده

Boosting and Bagging, as two representative approaches to learning classiier committees, have demonstrated great success, especially for decision tree learning. They repeatedly build diierent classiiers using a base learning algorithm by changing the distribution of the training set. Sasc, as a diierent type of committee learning method, can also signiicantly reduce the error rate of decision trees. It generates classiier committees by stochastically modifying the set of attributes but keeping the distribution of the training set unchanged. It has been shown that Bagging and Sasc are, on average, less accurate than Boosting, but the performance of the former is more stable than that of the latter in terms of less frequently obtaining signiicantly higher error rates than the base learning algorithm. In this paper, we propose a novel committee learning algorithm, called SascBag, that combines Sasc and Bagging. It creates diierent classiiers by stochastically varying both the attribute set and the distribution of the training set. Experimental results in a representative collection of natural domains show that, for decision tree learning, the new algorithm is, on average, more accurate than Boosting, Bagging, and Sasc. It is more stable than Boosting. In addition, like Bagging and Sasc, SascBag is amenable to parallel and distributed processing while Boosting is not. This gives SascBag another advantage over Boosting for parallel machine learning and datamining.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extending Bayesian Classifier with Ontological Attributes

The goal of inductive learning classification is to form generalizations from a set of training examples such that the classification accuracy on previously unobserved examples is maximized. Given a specific learning algorithm, it is obvious that its classification accuracy depends on the quality of training data. In learning from examples, noise is anything which obscures correlations between ...

متن کامل

Selecting representative examples and attributes by a genetic algorithm

A nearest-neighbor classifier compares an unclassified object to a set of preclassified examples and assigns to it the class of the most similar of them (the object’s nearest neighbor). In some applications, many pre-classified examples are available and comparing the object to each of them is expensive. This motivates studies of methods to remove redundant and noisy examples. Another strand of...

متن کامل

Fisher Discriminant Analysis (FDA), a supervised feature reduction method in seismic object detection

Automatic processes on seismic data using pattern recognition is one of the interesting fields in geophysical data interpretation. One part is the seismic object detection using different supervised classification methods that finally has an output as a probability cube. Object detection process starts with generating a pickset of two classes labeled as object and non-object and then selecting ...

متن کامل

Using Prediction from Sentential Scope to Build a Pseudo Co-Testing Learner for Event Extraction

Event extraction involves the identification of instances of a type of event, along with their attributes and participants. Developing a training corpus by annotating events in text is very labor intensive, and so selecting informative instances to annotate can save a great deal of manual work. We present an active learning (AL) strategy, pseudo co-testing, based on one view from a classifier a...

متن کامل

Constructing Diverse Classifier Ensembles using Artificial Training Examples

Ensemble methods like bagging and boosting that combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. This paper presents a new method for generating ensembles that directly constructs diverse hypotheses using additional arti...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998